Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network
Authors
Abstract
Visual storytelling aims to generate human-level narrative language (i.e., a natural paragraph with multiple sentences) from a photo stream. A typical photo story consists of a global timeline with multi-thread local storylines, each of which occurs in a different scene. This complex structure leads to large content gaps at scene transitions between consecutive photos. Most existing image/video captioning methods achieve only limited performance, because the units in traditional recurrent neural networks (RNNs) tend to “forget” the previous state when the visual sequence is inconsistent. In this paper, we propose a novel visual storytelling approach based on a Bidirectional Multi-thread Recurrent Neural Network (BMRNN). First, based on the mined local storylines, a skip gated recurrent unit (sGRU) with delay control is proposed to maintain longer-range visual information. Second, using sGRUs as basic units, the BMRNN is trained to align the local storylines with the global sequential timeline. Third, a new training scheme with a storyline-constrained objective function is proposed that jointly considers both global and local matches. Experiments on three standard storytelling datasets show that the BMRNN model outperforms state-of-the-art methods.
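The abstract gives no equations for the sGRU, so the following is only a minimal sketch of the general idea: a GRU-style cell that, besides the previous hidden state, also receives the hidden state of an earlier photo from the same local storyline. The class `SkipGRUCell`, the helper `run_sequence`, the extra skip gate `s`, the projection `Ps`, and the `skip_from` index list are all hypothetical names and design choices made for illustration; they are not the paper's actual formulation, and the bidirectional pass and training objective are not covered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SkipGRUCell:
    """Illustrative GRU cell with an extra gated input from an earlier state.

    Assumption: the skip term is added inside the candidate activation and
    controlled by its own gate. The paper's sGRU may be defined differently.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)

        def mat(rows, cols):
            return rng.normal(scale=0.1, size=(rows, cols))

        # Standard GRU parameters: update gate z, reset gate r, candidate h.
        self.Wz, self.Uz = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wr, self.Ur = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wh, self.Uh = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        # Hypothetical skip parameters: gate s and projection of the skipped state.
        self.Ws, self.Us = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Ps = mat(hidden_dim, hidden_dim)

    def step(self, x, h_prev, h_skip):
        z = sigmoid(self.Wz @ x + self.Uz @ h_prev)   # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h_prev)   # reset gate
        s = sigmoid(self.Ws @ x + self.Us @ h_skip)   # skip gate (assumed form)
        candidate = np.tanh(
            self.Wh @ x + self.Uh @ (r * h_prev) + s * (self.Ps @ h_skip)
        )
        return (1.0 - z) * h_prev + z * candidate


def run_sequence(cell, inputs, skip_from, hidden_dim):
    """Run the cell over a photo sequence.

    skip_from[t] is the index of an earlier photo in the same storyline,
    or -1 when photo t has no earlier photo to link to.
    """
    states = []
    h = np.zeros(hidden_dim)
    for t, x in enumerate(inputs):
        p = skip_from[t]
        h_skip = states[p] if p >= 0 else np.zeros(hidden_dim)
        h = cell.step(x, h, h_skip)
        states.append(h)
    return np.stack(states)


# Five photo features alternating between two scenes: A, B, A, B, B.
rng = np.random.default_rng(1)
photos = rng.normal(size=(5, 4))
skip_links = [-1, -1, 0, 1, 3]  # each photo points to the previous photo of its scene
cell = SkipGRUCell(input_dim=4, hidden_dim=8)
hidden = run_sequence(cell, photos, skip_links, hidden_dim=8)
no_skip = run_sequence(cell, photos, [-1] * 5, hidden_dim=8)
```

In this sketch, photo 2 sees the state of photo 0 directly even though photo 1 from a different scene sits between them, which is the kind of longer-range link the abstract attributes to the sGRU. How the local storylines are mined and how the delay is controlled are left open here.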
Related resources
Show, Reward and Tell: Automatic Generation of Narrative Paragraph from Photo Stream by Adversarial Training
Impressive image captioning results (i.e., an objective description for an image) are achieved with plenty of training pairs. In this paper, we take one step further to investigate the creation of narrative paragraph for a photo stream. This task is even more challenging due to the difficulty in modeling an ordered photo sequence and in generating a relevant paragraph with expressive language s...
Multi-Step-Ahead Prediction of Stock Price Using a New Architecture of Neural Networks
Modelling and forecasting Stock market is a challenging task for economists and engineers since it has a dynamic structure and nonlinear characteristic. This nonlinearity affects the efficiency of the price characteristics. Using an Artificial Neural Network (ANN) is a proper way to model this nonlinearity and it has been used successfully in one-step-ahead and multi-step-ahead prediction of di...
Expressing an Image Stream with a Sequence of Natural Sentences
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures on their special moments, it would better take into consideration of the whole image stream to produce natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our...
Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
Online-Recognition requires the acoustic model to provide posterior probabilities after a limited time delay given the online input audio data. This necessitates unidirectional modeling and the standard solution is to use unidirectional long short-term memory (LSTM) recurrent neural networks (RNN) or feedforward neural networks (FFNN). It is known that bidirectional LSTMs are more powerful and ...
Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks
The stream of words produced by Automatic Speech Recognition (ASR) systems is devoid of any punctuations and formatting. Most natural language processing applications usually expect segmented and well-formatted texts as input, which is not available in ASR output. This paper proposes a novel technique of jointly modelling multiple correlated tasks such as punctuation and capitalization using bi...
Journal title:
- CoRR
Volume abs/1606.00625
Publication date 2016